Index Compression Using Fixed Binary Codewords

Authors

  • Vo Ngoc Anh
  • Alistair Moffat
Abstract

Document retrieval and web search engines index large quantities of text. The static costs associated with storing the index can be traded against dynamic costs associated with using it during query evaluation. Typically, index representations that are effective and obtain good compression tend not to be efficient, in that they require more operations during query processing. In this paper we describe a scheme for compressing lists of integers as sequences of fixed binary codewords that has the twin benefits of being both effective and efficient. Experimental results are given on several large text collections to validate these claims.
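
The abstract does not spell out the codeword layout, so the following is only a minimal sketch of the general idea of storing a list of integers as fixed-width binary codewords. It is not the scheme from the paper. It assumes a strictly increasing list of positive document numbers, converts it to gaps, and packs every gap of one block with the same bit width. The function names (`to_gaps`, `encode_block`, `decode_block`, `from_gaps`) are hypothetical.

```python
from typing import List, Tuple


def to_gaps(postings: List[int]) -> List[int]:
    """Turn strictly increasing document numbers into successive differences."""
    gaps, prev = [], 0
    for doc in postings:
        gaps.append(doc - prev)
        prev = doc
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Inverse of to_gaps: running sum of the gaps."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def encode_block(gaps: List[int]) -> Tuple[int, int, int]:
    """Pack all gaps with one shared width (assumed layout, for illustration).

    Returns (width, count, packed), where packed holds the gaps as
    consecutive width-bit fields, first gap in the most significant field.
    """
    width = max([1] + [g.bit_length() for g in gaps])
    packed = 0
    for gap in gaps:
        packed = (packed << width) | gap
    return width, len(gaps), packed


def decode_block(width: int, count: int, packed: int) -> List[int]:
    """Extract count fields of width bits each using shifts and a mask."""
    mask = (1 << width) - 1
    return [(packed >> (width * (count - 1 - i))) & mask for i in range(count)]


if __name__ == "__main__":
    postings = [3, 7, 8, 15, 16]
    width, count, packed = encode_block(to_gaps(postings))
    # Decoding needs only shifts and masks, with no per-value length lookup.
    assert from_gaps(decode_block(width, count, packed)) == postings
```

Because every codeword in a block has the same width, decoding is a fixed shift-and-mask loop, which is one way the trade-off between index size and query-time work mentioned in the abstract can show up in practice.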

Similar resources

A Post-Processing Mechanism for Sequential Use of Static and Dynamic Enumerative Code

A bijection between a complete set of source words and a complete set of codewords defines a variable-to-variable length (VV) source code. Such code is used to parse sequentially a source sequence into codewords. In a naive parsing of a finite source sequence, the last incomplete source word requires a separate post-processing. However, if the sizes of the source and the code alphabet are the s...

An adaptive incremental LBG for vector quantization

This study presents a new vector quantization method that generates codewords incrementally. New codewords are inserted in regions of the input vector space where the distortion error is highest until the desired number of codewords (or a distortion error threshold) is achieved. Adoption of the adaptive distance function greatly increases the proposed method's performance. During the incrementa...

Spatial Image Watermarking by Error-Correction Coding in Gray Codes

In this paper, error-correction coding (ECC) in Gray codes is considered and its performance in the protecting of spatial image watermarks against lossy data compression is demonstrated. For this purpose, the differences between bit patterns of two Gray codewords are analyzed in detail. On the basis of the properties, a method for encoding watermark bits in the Gray codewords that represent sig...

On a Class of Constant Weight Codes

For any odd prime power q we first construct a certain non-linear binary code C(q, 2) having (q² − q)/2 codewords of length q and weight (q − 1)/2 each, for which the Hamming distance between any two distinct codewords is in the range [q/2 − 3√q/2, q/2 + 3√q/2], that is, ‘almost constant’. Moreover, we prove that C(q, 2) is distance-invariant. Several variations and improvements on this theme are...

Robust Image and Video Coding with Pyramid Vector Quantisation

Most current image and video coding standards use variable length codes to achieve compression, which renders the compressed bitstream very sensitive to channel errors. In this paper, image and video coders based on Pyramid Vector Quantisation (PVQ) and using only fixed length codes are proposed. Still image coders using PVQ in conjunction with DCT and wavelet techniques are described and their...

Publication date: 2004